(54) Machine translation system

(57) Source language text from an input interface 
(100) is broken down into source language morphemes 
by a morphological analyser (101). A syntactic analyser 
(102) converts the morphemes into source language 
signs labelled with identifiers and data identifying other 
signs which are grammatically related. A bilingual equiv- 
alence transformer (103) transforms the source lan- 
guage signs to target language signs which are com- 
bined by a combiner (104) to provide a first attempt at 
a target language structure. The structure is repeatedly 
evaluated by an evaluator (105) and transformed by a
transformer (106). The signs of well formed substructures
identified by the evaluator (105) are not dissociated
from each other by the transformer. This process
ends when either the whole target language structure is 
evaluated as being well formed or all transformations 
have been unsuccessfully evaluated. 
Description

The present invention relates to a machine transla- 
tion system. 

There are several known machine translation sys- 
tems which are based on the "lexical-semantic transfer" 
approach disclosed in Whitelock, Proc. of COLING-92, 
Aug. 23-28, 1992, "Shake-and-Bake Translation". All of 
these may be conceptually segmented in terms of a 
computing system which takes a sentence as input. The 
input is passed into modules which convert the funda- 
mental linguistic elements from their original language 
into the language in which they are to be translated and 
reassemble them in a grammatical manner. On successful
reassembly, the translated sentence is extracted
from the translated language structure and output. The 
various modules are as follows: 

1. A "parsing module" analyses the sentence to be 
translated (the source sentence) and extracts the 
resulting lexical items or lexical signs (items from a 
dictionary for the source language made grammat- 
ically more specific by the analysis just performed). 

2. A "transfer module" translates the source lan- 
guage lexical items into sets of lexical items in the 
target language. For this process to work, some of 
the critical semantic information inferred from the 
source analysis must also be maintained between 
the target signs. This is the origin of the term "lexi- 
cal-semantic transfer". 

3. A "generation module" reduces the collection of 
target signs into a grammatical structure by trying 
to reduce arbitrary combinations of them by either: 

(a) Producing arbitrary permutations of a struc- 
ture that might fit the target-language lexical 
signs because one of the structures should 
eventually be correct (this is also known as 
"generate-and-test"). 

(b) Eliminating impossible structures by a sys- 
tem of constraints. 

However, both of these approaches are undi- 
rected in that there is no systematic means of as- 
sembling a target language structure. It is this arbi- 
trary aspect of their operation that makes them 
computationally prohibitively expensive to use for 
general translation. 

4. If the generation module succeeds in producing 
a grammatical structure, an "output module" ex- 
tracts its orthography (spelling), which has been ob- 
tained using the various grammatical rules or con- 
straints applied in the previous step, giving the 
translated sentence (the target sentence). 



A problem with the generate-and-test technique, as
mentioned above, is that it can require a large amount
of processing time in order either to arrive at a correct
translation or to exhaust all the possible structures and
give up. For instance, where there are X target signs to
be formed into a grammatical structure, the system will
try all possible permutations of these signs. For many
source language sentences, a correct structure will be
found after a reasonable amount of processing time.
However, for many sentences, a large proportion of all
the possible permutations will be tried before a correct
structure is derived. For some sentences which the system
is incapable of translating, all of the permutations
will have to be tried before the system admits defeat and
moves on to another sentence. In such cases, the
number of permutations is X! (factorial X). For sentences
where X is a relatively small number, for instance of
the order of five or six, this does not represent a
disadvantage. However, for source language sentences giving
rise to, for instance, ten or more target language
signs, such systems will not admit failure until millions
of attempts have been made. This results in the system
becoming intractable when embodied by currently available
data processing systems because the processing
speeds of such systems are insufficient to allow translation
to be performed within a viable time frame. For
complex source language sentences, the required
processing time before admitting failure may become
days, years or even more millennia than the anticipated
life of the universe.
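
Purely by way of illustration, the cost of the generate-and-test technique described above can be sketched as follows. The function name `generate_and_test`, the list of signs and the always-failing test are assumptions made for this sketch and are not taken from any of the known systems.

```python
from itertools import permutations
from math import factorial


def generate_and_test(signs, is_well_formed):
    """Try orderings of the signs until one passes the test.

    Returns the first accepted ordering (or None) and the number of attempts.
    """
    attempts = 0
    for candidate in permutations(signs):
        attempts += 1
        if is_well_formed(candidate):
            return candidate, attempts
    return None, attempts


# Hypothetical target signs; the test always fails, modelling a sentence
# that the system is incapable of translating.
signs = ["the", "dog", "like", "present", "the", "fox"]
result, attempts = generate_and_test(signs, lambda candidate: False)
print(attempts, factorial(len(signs)))  # both are 720 for six signs
```

With six signs the loop gives up after 720 attempts. Each additional sign multiplies the worst case by the new number of signs, which is the growth the passage above describes.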

EP 0 568 319 discloses an arrangement which represents
a development of the basic "shake-and-bake"
machine translation system. This arrangement identifies
all possible pair-wise combinations of the target language
signs so as to form a set of relationships. The
system then explores the ways in which the pair-wise
combinations can be assembled into a layer structure.
Any structure which fails, for instance because not all of
the signs are used in the structure, is rejected and a totally
new structure is tried.

According to the present invention, there is provid- 
ed a machine translation system as defined in the ap- 
pended Claim 1. 

Preferred embodiments of the invention are defined
in the other appended claims.

The term "grammatically complete section of text" 
means any section of text which is essentially complete
in itself. Generally, this will be a sentence but alterna- 
tives include a clause or a phrase. 

It is thus possible to provide a machine translation
system which is capable of being embodied by currently
available data processing systems and which is a tractable
system. In other words, the system will either succeed
in translating, for instance, a sentence or will
explore all possible target language structures and give up
in a reasonable time. For instance, for X target language
signs, the maximum number of transformations which
will be performed before giving up is believed to be less
than X³ or a reasonably low order polynomial in X. In
practice, far fewer transformations than this may well be
sufficient. Thus, the possibility of the system effectively
becoming locked in an undesirable mode can be prevented.

The system provides improved efficiency by pre- 
serving well formed substructures in the sense that such 
substructures can be added to if appropriate but are not
broken up after being tried. Thus, convergence to a
complete translation, when such is possible, will occur 
more quickly than with known systems, for instance of 
the type disclosed in EP 0 568 319. 

The invention will be further described, by way of 
example, with reference to the accompanying drawings, 
in which: 

Figure 1 is a block schematic diagram of a machine 
translation system constituting a preferred embod- 
iment of the invention; and 

Figures 2 to 15 illustrate information produced dur- 
ing operation of the system of Figure 1. 

The machine translation system shown in Figure 1 
may be embodied by hardware which is dedicated to 
performing the operations which will be described here- 
inafter. However, in general, the system will be embod- 
ied by a programmable data processor controlled by 
suitable software, for instance stored in semiconductor 
or magnetic memory. From the drawings and following 
description, the person skilled in the art will readily be 
able to make dedicated hardware or write dedicated 
software for controlling programmable hardware. 

The system shown in Figure 1 comprises an input 
interface 100 which permits text in a source language 
to be entered into the system. For instance, the input 
interface 100 may comprise a keyboard or magnetic 
disc reader. The output of the interface is connected to 
a morphological analyzer 101 which analyses the input 
text into the most basic language units, which are known 
as morphemes and which comprise base forms of the 
words and affixes (prefixes and suffixes) which modify 
the base forms. 

The morphemes from the analyzer 101 are supplied
to a syntactic analyzer or parser 102 which applies rules 
of the source language grammar to the morphemes so 
as to define the grammatical relationships between the 
morphemes. This information is supplied to a bilingual 
equivalence transformer 103. In addition, the analyzer 
102 derives a "tree" which defines how the morphemes 
were combined in the source language. 

The morphemes together with their associated data 
are known as "signs" and are supplied to the bilingual 
equivalence transformer 103. The transformer 103 ap- 
plies bilingual equivalence rules which cause each 
source language sign to be replaced by an equivalent 
target language sign such that each source language 
morpheme is transformed to its equivalent target language
morpheme and the grammatical data of each
source language sign is transformed into corresponding
grammatical data for the target language. The transformer
103 thus produces output language signs which
are supplied to a combiner 104.

The combiner 104 combines the target language 
signs so as to make an initial attempt at forming the tar- 
get language equivalent of the input text. The combiner 
defines a target language tree (a parsing tree) which 
may be of any predetermined or random structure. How- 
ever, because parsing trees in many languages have 
substantial similarities, the combiner 104 preferably 
makes use of the source language parsing tree from the 
analyzer 102 to make a first attempt at the target lan- 
guage text. 

The linguistic structure defined by the source lan- 
guage parsing tree and the target language signs is sup- 
plied to an evaluator 105 which evaluates the validity of 
the first attempt at the target language text by applying 
a set of target language grammar rules to the signs. If
the evaluation is successful, the textual information is 
passed to an output interface 107 which supplies the 
output text of the system, for instance to a printer, visual 
display unit, or memory. If the evaluation is not success- 
ful, then the structure is transformed in a transformer 
106 so as effectively to alter the parsing tree without
destroying any part of the structure which has been
evaluated as being correct and re-evaluation is performed
by the evaluator 105. Each transformation should have 
the effect of improving the structure so that the structure 
converges on a correct target language translation. Al- 
ternatively, if the system cannot produce a correct trans- 
lation, it will fail after a relatively small number of itera- 
tions and pass on to another input sentence. If the sys- 
tem does not fail, this process is repeated until evalua- 
tion is successful and the signs and correct structure can 
be passed to the output interface 107. Thus, the mor- 
phemes are transformed into the correct target lan- 
guage text units i.e. words, placed in the correct target 
language order. 

In order to explain the operation of the system 
shown in Figure 1 more clearly, a specific example will 
be described in detail showing the steps in translating a 
sentence in French (the source language) to the equiv- 
alent sentence in English (the target language). Figure 
2 shows the input text A provided by the input interface 
100 as the French sentence "Le rapide renard brun plait 
au chien". This text is supplied to the morphological
analyzer, which replaces each word of the French text by
the equivalent morpheme or morphemes. The morphemes
are shown in Figure 3 as the information B supplied
by the analyzer 101. The morphemes are supplied
to the syntactic analyzer 102 which performs a parsing
operation by applying the rules of French grammar so
as to derive a parsing tree as illustrated in Figure 4.
Indices are allocated to the morphemes, for example, in
accordance with the order of the morphemes which in
turn corresponds largely to the order of the words in the
French text. Thus, the first morpheme (index = 1) is the
word "Le", the second morpheme (index = 2) is the
French word "rapide", and so on. The finite verb "plait"
is replaced by two morphemes having indexes 5 and 6.
The fifth morpheme is the infinitive verb "plaire" and the
sixth morpheme is "present" to indicate the present
tense of the verb in the input text. Similarly, the seventh
and eighth morphemes break down the French word
"au" into "a" and "le". Thus, the fifth, seventh, and eighth
morphemes represent base forms whereas the sixth
morpheme represents an affix of the fifth morpheme.
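
As a minimal sketch of the breakdown just described, the following assumes a toy lexicon containing only the words of the example sentence. The names `LEXICON` and `analyse` are hypothetical. A real morphological analyser would derive the morphemes from rules rather than from a fixed table.

```python
# Hypothetical toy lexicon: each surface word maps to its morphemes.
LEXICON = {
    "le": ["le"],
    "rapide": ["rapide"],
    "renard": ["renard"],
    "brun": ["brun"],
    "plait": ["plaire", "present"],  # finite verb = base form + tense affix
    "au": ["a", "le"],               # contraction of preposition and article
    "chien": ["chien"],
}


def analyse(text):
    """Return (index, morpheme) pairs, indexed from 1 in order of appearance."""
    morphemes = []
    for word in text.lower().split():
        for morpheme in LEXICON[word]:
            morphemes.append((len(morphemes) + 1, morpheme))
    return morphemes


for index, morpheme in analyse("Le rapide renard brun plait au chien"):
    print(index, morpheme)
```

The seven words yield nine indexed morphemes, with "plaire" and "present" at indices 5 and 6 and "a" and "le" at indices 7 and 8, in agreement with the description above.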

As a result of the parsing operation which defines
the syntactic relationships between the morphemes, the
syntactic analyzer 102 labels each morpheme with certain
data indicating the relationships between the morphemes
so as to produce the French signs. Figure 5 illustrates
the information produced by the analyzer 102
so as to label several different types of morphemes.
Thus, for a morpheme comprising a French noun, the
sign comprises information 110 indicating that the morpheme
is a noun, allocating it an index and giving the spelling of
the morpheme. A verb sign 111 similarly comprises the
allocated index and the spelling of the morpheme. In addition,
the sign includes the index of the morpheme
which is the subject of the verb and the index of the morpheme
which is the object of the verb in the source language
(French).

Signs for adjectives and prepositions are shown at
112 and 113. Each of these comprises the index of the
morpheme, its spelling and the index of the morpheme
to which it is grammatically related.

Figure 6 illustrates the signs D produced by the analyzer
102 corresponding to the French sentence shown
in Figure 2. The first sign is labelled with index 1 corresponding
to the first morpheme. It modifies the morpheme
with index 3 and has the spelling "le". The second
sign corresponding to the second morpheme has
index 2, modifies the third morpheme (index 3), and has
the spelling "rapide". The third sign corresponding to the
third morpheme has index 3 and spelling "renard". The
fourth sign corresponding to the fourth morpheme has
index 4, modifies morpheme 3 and has spelling "brun".

The fifth sign has index 5 corresponding to the fifth
morpheme, is a verb whose subject is morpheme 3 and
whose object is morpheme 9 and has the spelling
"plaire". The sixth sign corresponds to the sixth morpheme
and has index 6, modifies the fifth morpheme,
and has the spelling "present" indicating the present
tense. The seventh sign corresponds to the seventh
morpheme and has index 7, modifies morpheme 9, and
has the spelling "a". The eighth sign has index 8 corresponding
to the index of the eighth morpheme, modifies
the ninth morpheme, and has spelling "le". The ninth
sign has index 9 and has the spelling "chien".
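
The nine French signs listed above can be represented, by way of example only, as simple records. The class name `Sign`, its field names and the helper `dangling_references` are assumptions for this sketch. The actual layout of the signs is that shown in Figures 5 and 6.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sign:
    index: int
    spelling: str
    modifies: Optional[int] = None  # index of the morpheme this sign modifies
    subject: Optional[int] = None   # verbs only
    obj: Optional[int] = None       # verbs only


FRENCH_SIGNS = [
    Sign(1, "le", modifies=3),
    Sign(2, "rapide", modifies=3),
    Sign(3, "renard"),
    Sign(4, "brun", modifies=3),
    Sign(5, "plaire", subject=3, obj=9),
    Sign(6, "present", modifies=5),
    Sign(7, "a", modifies=9),
    Sign(8, "le", modifies=9),
    Sign(9, "chien"),
]


def dangling_references(signs):
    """Return indices that are referred to but do not belong to any sign."""
    known = {s.index for s in signs}
    referred = {r for s in signs
                for r in (s.modifies, s.subject, s.obj) if r is not None}
    return sorted(referred - known)


print(dangling_references(FRENCH_SIGNS))  # [] when every reference resolves
```

Each sign records its grammatical relations by index rather than by position, so the relations remain available after the signs have been rearranged.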

The French signs are supplied to the transformer 
103 which applies a set of bilingual equivalence rules 
so as to transform the French signs into the equivalent 
English signs. These equivalence rules amount effectively
to a bilingual dictionary in which the spelling of
each French sign is replaced by the spelling of the English
sign and the labels of the French signs are replaced
by the equivalent labels of the English signs. The rules
E relating to the signs illustrated in Figure 6 are shown
in Figure 7. In the case of the rules (i) to (iii) and (v) to
(viii) each English sign has the same modifier and index
number as the corresponding French sign so that the
only difference is in the "spelling label" where the French
morpheme is replaced by the English morpheme. However,
in the case of rule (iv), the transformer 103 recognises
the fifth and seventh signs illustrated in Figure 6
and transforms the labels as shown in Figure 7. Thus,
the infinitive verb "plaire" together with the preposition
"a" is recognised as corresponding to the English infinitive
verb "to like" and the spelling is transformed accordingly.
However, the rule recognises that, in translating
between French and English, the subject and object
have to be reversed. Thus, whereas the French sign
with index 5 has as its subject and object the morphemes
with indices 3 and 9, respectively, the English
sign has the morpheme of index 3 as its object and the
morpheme of index 9 as its subject. The index remains
unchanged. The other transformations illustrated by rule
(iv) in Figure 7 are required for translation from English
to French and need not therefore be further described
for the purposes of explaining this example.

Figure 8 shows the English signs F produced by the 
transformer 103 as corresponding to the French signs 
shown in Figure 6. Thus, by applying rule (i) of Figure 7 
to the first French sign shown in Figure 6 (index 1), the
spelling changes from "le" to "the", the index of the Eng- 
lish sign is equal to 1 i.e. the same as the corresponding 
French sign, and the sign modifies the morpheme with 
the index 3 as in the case of the French sign. Similarly, 
English signs 3, 4, 6, 8, and 9 are unchanged in respect 
of their index and the modifier (the index of the mor- 
pheme which each modifies) so that only the spelling 
differs corresponding to the transformation from French 
to English. The fifth sign, as described above, corre- 
sponds to the fifth French sign but with the subject and 
object indices exchanged. 
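
The transfer step described above may be sketched as follows, assuming a dictionary limited to the example sentence and a hard-wired treatment of "plaire" with "a" standing in for rule (iv). The names `ONE_TO_ONE` and `transfer` and the dictionary-based sign layout are hypothetical. The actual equivalence rules are those of Figure 7.

```python
# Field names follow the abbreviations used for Figure 9:
# I = index, M = modifies, S = subject, O = object; "sp" is the spelling.
FRENCH_SIGNS = [
    {"I": 1, "sp": "le", "M": 3},
    {"I": 2, "sp": "rapide", "M": 3},
    {"I": 3, "sp": "renard"},
    {"I": 4, "sp": "brun", "M": 3},
    {"I": 5, "sp": "plaire", "S": 3, "O": 9},
    {"I": 6, "sp": "present", "M": 5},
    {"I": 7, "sp": "a", "M": 9},
    {"I": 8, "sp": "le", "M": 9},
    {"I": 9, "sp": "chien"},
]

# Hypothetical one-to-one entries of the bilingual dictionary.
ONE_TO_ONE = {"le": "the", "rapide": "quick", "renard": "fox",
              "brun": "brown", "present": "present", "chien": "dog"}


def transfer(french_signs):
    """Map French signs to English signs, keeping indices and relations."""
    # "plaire" plus the preposition "a" attached to its object becomes "like".
    verb = next(s for s in french_signs if s["sp"] == "plaire")
    absorbed = {s["I"] for s in french_signs
                if s["sp"] == "a" and s.get("M") == verb["O"]}
    english = []
    for sign in french_signs:
        if sign["I"] in absorbed:
            continue  # no separate English sign for the preposition
        if sign is verb:
            # subject and object are exchanged; the index is unchanged
            english.append({"I": sign["I"], "sp": "like",
                            "S": sign["O"], "O": sign["S"]})
        else:
            english.append({**sign, "sp": ONE_TO_ONE[sign["sp"]]})
    return english


ENGLISH_SIGNS = transfer(FRENCH_SIGNS)
for sign in ENGLISH_SIGNS:
    print(sign)
```

Eight English signs result. The sign with index 7 has no counterpart, and the sign with index 5 carries subject 9 and object 3.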

The combiner 104 takes the English signs shown in 
Figure 8 and combines these in accordance with the 
parsing tree C shown in Figure 4 and derived from the 
French syntax by the analyzer 102. In Figure 9, the signs
are shown as boxes with the labels abbreviated such
that "I" stands for "index", "M" stands for "modifies", "S"
stands for "subject", and "O" stands for "object". The
French sign with index 7 is not present in Figure 9 since
the transformer 103 has recognised that this effectively
forms part of the verb in French and there is no separate
English sign for this. In other words, the English sign
with index 5 represents both French signs with indices
5 and 7. Otherwise, the signs are effectively arranged
in order of their indices with the tree structure of Figure 
4 being applied thereto. 

The tree structure illustrated in Figure 9 is of the binary
type in which each of the nodes G1 to G7 has two
branches. Thus, the node G1 may be thought of as a
"trunk" node having two branches which extend to the
nodes G2 and G5. The lowest level of nodes, referred
to as "leaf nodes", comprises the signs themselves.

The information illustrated at G in Figure 9 is supplied
to the evaluator 105 which evaluates whether the
structure in the target language (English) is correct. In
order to do this, the evaluator applies a set of English
grammar rules H which are illustrated in Figure 10. In
particular, Figure 10 shows those English grammar
rules which are sufficient to allow the signs and the
structure of Figure 9 to be evaluated. In Figure 10, a
vertical bar indicates an alternative.

Thus, the first rule effectively states that, if a node
is connected to two sub-nodes representing a noun
phrase followed by a verb phrase with the subject of the
verb phrase being identical to the index of the noun
phrase, then the node is well formed and represents a
sentence. The second rule states that, if a node is connected
to two sub-nodes representing a determinant followed
by a noun "sub-phrase" with the determinant
modifying the noun phrase, then the index of the noun
phrase is equal to the index of the noun sub-phrase. In
other words, that node may then be given a label in
which the index is equal to the index of the noun phrase.

The third rule states that, if a node is connected to
one or two subsidiary nodes, then there are two possibilities
for defining the node as well formed and labelling
it. If there is a single subsidiary node which represents
a noun, then the node is well formed and is labelled with
the same index as the noun. Alternatively, if the node is
connected to two nodes which represent an adjective
followed by a noun, and if the adjective modifier is equal
to the index of the noun (i.e. the adjective modifies that
noun), then the node is well formed and is assigned an
index equal to the index of the noun.

The remaining rules are illustrated in Figure 10 together
with a definition of the abbreviations. Thus, rules
(a) to (f) are used by the evaluator 105 to evaluate the
structure illustrated by the parsing tree shown in Figure
9.
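
The three rules described in words above can be sketched as a single function over the labels of a node's children. The category names "N", "Adj", "Det", "NSub", "NP", "VP" and "S", the helper `label` and the function `apply_rules` are assumptions made for this sketch. The authoritative form of the rules is that of Figure 10.

```python
def label(cat, sp, I=None, M=None, S=None):
    """Build a node label; unused fields stay None."""
    return {"cat": cat, "sp": sp, "I": I, "M": M, "S": S}


def apply_rules(children):
    """Return the label of a node from its children's labels, or None."""
    if len(children) == 1:
        (only,) = children
        # third rule, first alternative: a lone noun is a noun sub-phrase
        if only["cat"] == "N":
            return label("NSub", only["sp"], I=only["I"])
        return None
    left, right = children
    joined = left["sp"] + " " + right["sp"]
    # first rule: noun phrase + verb phrase whose subject is that noun phrase
    if left["cat"] == "NP" and right["cat"] == "VP" and right["S"] == left["I"]:
        return label("S", joined)
    # second rule: determinant + noun sub-phrase that it modifies
    if left["cat"] == "Det" and right["cat"] == "NSub" and left["M"] == right["I"]:
        return label("NP", joined, I=right["I"])
    # third rule, second alternative: adjective + noun that it modifies
    if left["cat"] == "Adj" and right["cat"] == "N" and left["M"] == right["I"]:
        return label("NSub", joined, I=right["I"])
    return None


quick = label("Adj", "quick", I=2, M=3)
fox = label("N", "fox", I=3)
the = label("Det", "the", I=1, M=3)

g4 = apply_rules([quick, fox])    # noun sub-phrase with index 3
np = apply_rules([the, g4])       # noun phrase with index 3
print(np["cat"], np["I"], np["sp"])
print(apply_rules([fox, quick]))  # None: wrong order, no rule applies
```

Returning `None` when no rule applies corresponds to a node being left as not well formed. A successful rule passes the index of the head noun up to the parent label.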

In addition to the grammar rules illustrated in Figure
10, the evaluator 105 applies a further set of rules
amounting to an algorithm for evaluating the structure
shown in Figure 9. Initially the nodes G1 to G7 are labelled
as having not been evaluated. If the node has
already been evaluated and therefore has already been
labelled with an index, the node is unchanged. If the
node is a leaf node, i.e. has no sub-nodes or "children",
it is labelled with the index of the target language sign
to which it relates. If the node has not previously been
successfully evaluated, it is evaluated on the basis of
the labels of its sub-nodes or children. Finally, each evaluation
begins at the top node or trunk (G1 in Figure 9).
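
A minimal sketch of this evaluation algorithm is given below. The class `Node`, the function `evaluate` and the single stand-in rule `toy_rules` are hypothetical. Only the control flow is intended to reflect the description: evaluated nodes are left unchanged, leaves are labelled from their signs, and other nodes are evaluated from their children.

```python
class Node:
    """A tree node; leaves carry a target language sign, others have children."""

    def __init__(self, children=(), sign=None):
        self.children = list(children)
        self.sign = sign
        self.label = None  # None means "not (yet) evaluated as well formed"


def evaluate(node, rules):
    """Label `node` if possible and return its label (None if not well formed)."""
    if node.label is not None:
        return node.label              # already evaluated: left unchanged
    if not node.children:
        node.label = node.sign         # leaf: labelled from its sign
        return node.label
    child_labels = [evaluate(child, rules) for child in node.children]
    if all(lbl is not None for lbl in child_labels):
        node.label = rules(child_labels)  # may still be None
    return node.label


def toy_rules(labels):
    """Hypothetical stand-in for Figure 10: adjective + noun it modifies."""
    if len(labels) == 2:
        left, right = labels
        if left["cat"] == "Adj" and right["cat"] == "N" and left["M"] == right["I"]:
            return {"cat": "NSub", "I": right["I"],
                    "sp": left["sp"] + " " + right["sp"]}
    return None


quick = Node(sign={"cat": "Adj", "I": 2, "M": 3, "sp": "quick"})
fox = Node(sign={"cat": "N", "I": 3, "sp": "fox"})
brown = Node(sign={"cat": "Adj", "I": 4, "M": 3, "sp": "brown"})
g4 = Node([quick, fox])
g3 = Node([g4, brown])

evaluate(g3, toy_rules)
print(g4.label)  # a noun sub-phrase label with index 3
print(g3.label)  # None: a noun sub-phrase followed by an adjective fits no rule
```

Because a label is stored on the node once it has been found, a later call to `evaluate` on a transformed tree does not repeat the work for sub-trees that are already well formed.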

Applying the algorithm and grammar rules to the 
structure shown in Figure 9, which represents the first 
attempt at the correct structure, the node G1 is found 
not to have been evaluated and an attempt is made to
evaluate it on the basis of the children G2 and G5. The
evaluator 105 chooses, for sake of argument, the left
branch and attempts to evaluate the node G2 from the
children G3 and the leaf node of index I = 1. Although
the leaf node has been given the index 1, the node G3
has not been evaluated. The evaluator therefore tries to
evaluate the node G3 from its children and immediately
notes that the node G4 has not been evaluated. The
evaluator therefore attempts to evaluate the node G4.

As shown in Figure 9, the node G4 is connected to
leaf nodes having indices 2 and 3 representing an adjective
followed by a noun, the adjective modifying that
noun (M = 3 for the adjective and I = 3 for the noun). The
evaluator applies the grammar rules H shown in Figure
10 and determines that the second alternative of rule (d)
applies to the node G4. The node G4 is therefore labelled
as being well formed, as representing a noun sub-phrase,
as having an index equal to that of the noun, i.e.
equal to 3, and as having as its spelling the spelling
of the sign of index I = 2 followed by the spelling of the
sign of index I = 3.

The evaluator 105 then performs the same algorithm
for the right branch from the node G1 so as to evaluate
the nodes G5 to G7. The node G5 has not already
been evaluated and so the evaluator attempts to evaluate
it from its children. The evaluator first determines
that the node G6 has not been evaluated and attempts
to evaluate it on the basis of the grammar rules H. The
children of the node G6 are leaf nodes and comply with
the rule (f). Thus, the node G6 is labelled as representing
a finite verb whose index is 5, whose subject is 9,
whose object is 3, and whose spelling is "likes".

Having exhausted the left branch from the node G5,
the evaluator evaluates the right branch and finds that
the node G7 has not yet been evaluated. The node G7
is evaluated from its children, which are leaf nodes and
which fulfil the rule (b) shown in Figure 10. Thus, the
node G7 is labelled as a noun phrase with index 9 and
having as its spelling the spelling of the sign of index
I = 8 followed by the spelling of the sign of index I = 9.

This completes the initial evaluation by the evaluator
105 and the labels of the nodes G4, G6 and G7 are
shown in Figure 11. Although the nodes G4, G6 and G7
have been successfully evaluated, the remaining nodes
could not be evaluated and were therefore labelled as
being not well formed. The structure illustrated in Figure
9 is therefore incorrect and this is signalled to the transformer
106.

The transformer modifies the structure shown in
Figure 9 for re-evaluation but preserves the structure
which has already been evaluated as being well formed.
In particular, if a section of the tree below a node comprises
only well formed nodes but the node itself is not
well formed, then the section below that node is not disturbed
in the sense of removing any nodes from it, although
nodes may be added to it in subsequent steps.
Thus, "sub-trees" which are wholly well formed do not
need to be evaluated again but, if nodes are added, it is
merely necessary to evaluate the added nodes. This
limited re-evaluation is permissible provided the grammar
fulfils certain constraints such that it is "monotonic".
In this context, the term "monotonic" refers to grammars
which are such that the structure is always improved on
evaluation and transformation. Otherwise, when using
more perverse grammars, partial or full re-evaluation of
well formed sub-trees may be necessary. By "monotonic"
is meant: (i) that the order of the orthography of two
combining signs in the orthography of the result must
be determinate and must not depend on any subsequent
combination that the result may undergo; and (ii)
that if a well-formed structure which is part of an ill-formed
second structure becomes associated at the
highest possible place inside another structure, the result
will be well-formed after it is re-evaluated by the
evaluator.

Thus, a "maximal tree fragment" comprises a well- 
formed tree fragment (i.e. all of its nodes are well 
formed) which is not part of a bigger well-formed frag- 
ment. 
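
By way of illustration, the maximal tree fragments of a partially evaluated tree can be collected as sketched below. The class `Node` and the function `maximal_fragments` are hypothetical. The example tree follows the description of Figures 9 and 11 as far as the text allows it to be reconstructed, and the placement of the sign of index 4 directly under the node G3 is an assumption.

```python
class Node:
    def __init__(self, name, children=(), well_formed=False):
        self.name = name
        self.children = list(children)
        self.well_formed = well_formed


def maximal_fragments(node):
    """Return roots of well-formed fragments not contained in a bigger one.

    Assumes a node is only marked well formed when its whole subtree is.
    """
    if node.well_formed:
        return [node]          # do not descend: this fragment is maximal
    found = []
    for child in node.children:
        found.extend(maximal_fragments(child))
    return found


def leaf(index):
    """A leaf node (a sign) is treated as well formed by itself."""
    return Node(f"sign {index}", well_formed=True)


g4 = Node("G4", [leaf(2), leaf(3)], well_formed=True)
g3 = Node("G3", [g4, leaf(4)])
g2 = Node("G2", [leaf(1), g3])
g6 = Node("G6", [leaf(5), leaf(6)], well_formed=True)
g7 = Node("G7", [leaf(8), leaf(9)], well_formed=True)
g5 = Node("G5", [g6, g7])
g1 = Node("G1", [g2, g5])

print([n.name for n in maximal_fragments(g1)])
# ['sign 1', 'G4', 'sign 4', 'G6', 'G7']
```

The search stops at the first well-formed node on each path from the trunk, so the nodes inside G4, G6 and G7 are never offered to the transformer as separate fragments.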

The transformer 106 chooses any maximal tree
fragment which can combine with some other part of the
tree. In the example shown in the drawings, by testing the
nodes in an arbitrary order, for instance going from right
to left, the transformer 106 determines that the sign or
leaf node with index 4 can be associated with the sign
or leaf of index 3 because the former modifies the latter.
The transformer therefore alters the structure in this way
to produce the tree shown in Figure 12. In Figure 12, the
nodes G4, G6, and G7 which were previously successfully
evaluated and labelled are shown with their (abbreviated)
labels in place. The evaluator 105 then evaluates
the tree shown in Figure 12 using the same grammar
rules H and the same algorithm as before. Thus, the
node G8 is not well formed and neither is the node G9.
Its left branch connects to a leaf node which is therefore
well formed and its right branch is connected to the node
G4 which is well formed. The node G9 can therefore be
evaluated as it fulfils the rule (b). Thus, the node G9 is
labelled as being well formed as a noun phrase with index
3 and having as its spelling the spelling of the sign
whose index is 1 followed by the spelling of the node
G4. The node G10 may then be evaluated, if it has not
already been evaluated during the transformation by the
transformer 106, and fulfils rule (d) shown in Figure 10.
Thus, the node G10 is labelled as a noun sub-phrase of
index 3. For the sake of simplicity the rules which ensure
that the adjectives having indices 2 and 4 appear
in the correct order are not shown and will not be described.

The right branch from the node G8 is then evaluated
by evaluating the node G11. The children G6 and G7
are well formed but do not satisfy any of the rules (a) to
(f) of Figure 10. Although the node G6 is labelled as a
finite verb and the node G7 is labelled as a noun phrase,
when the rule (e) is applied to the nodes G6 and G7 it
is noted that the object O of G6 is equal to index 3 whereas
the index I of the node G7 is equal to 9. Thus, although
a finite verb and a noun phrase could combine
to form a verb phrase, the noun phrase at the node G7
is not the object of the finite verb and the node G11 does
not therefore fulfil the rule (e). Thus, the node G11 remains
not well formed.

The labels of the nodes G9 and G10 are shown in 
Figure 13. 

The transformer 106 thus performs a further transformation
of the tree shown in Figure 11. The nodes G4,
G6, G7, G9, and G10 are all now well formed and the
transformer 106 thus does not disturb them. However,
the node G11 is not well formed and the transformer
therefore moves the maximal tree fragment comprising
the node G6 and its leaf nodes to a place in the tree
structure where there is a noun phrase of index 3 with
which this maximal tree fragment representing a finite
verb can possibly combine successfully. The modified
structure is shown in Figure 14.

The evaluator 105 evaluates the tree shown in Figure
14 starting at the node G12. This is not well formed
and so the evaluator evaluates the child node in the left
branch, namely the node G13. This node is not well
formed but is connected to the well formed nodes G9
and G6. The rules shown in Figure 10 are applied to the
labels of the nodes G9 and G6 and, in particular, the rule
(e) is fulfilled, when the positions of the nodes G9 and
G6 are reversed, the finite verb (node G6) and the noun
phrase (node G9) being such that the object of G6 is
equal to the index of G9. As shown in Figure 15, the
node G13 is thus labelled well formed as a verb phrase
with the subject equal to 9 and the spelling equal to the
spelling of G6 followed by the spelling of G9.

The node G12 can now be evaluated because it is 
connected to the well formed nodes G13 and G7. The 
rule (a) is found to be fulfilled when the positions of the 
nodes G13 and G7 are reversed because the node G7
represents a noun phrase whose index is equal to the 
subject of the verb phrase at the node G13. The node 
G12 is therefore labelled as a sentence having as its 
spelling the spelling of G7 followed by the spelling of 
G13. The evaluation has been successfully completed 
to give the final spelling "The dog likes the quick brown 
fox", as shown in Figure 15. 
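
The last two evaluation steps, in which a rule is fulfilled only once the positions of two nodes are reversed, can be sketched as follows. The functions `verb_phrase`, `sentence` and `combine_either_order` are hypothetical renderings of rules (e) and (a) as they are described in the text. The noun phrase of index 3 is given its complete spelling directly, since the rules which order the adjectives are not shown.

```python
def verb_phrase(left, right):
    """Rule (e) as described: finite verb + noun phrase that is its object."""
    if left["cat"] == "FinV" and right["cat"] == "NP" and left["O"] == right["I"]:
        return {"cat": "VP", "S": left["S"], "sp": left["sp"] + " " + right["sp"]}
    return None


def sentence(left, right):
    """Rule (a) as described: noun phrase + verb phrase whose subject it is."""
    if left["cat"] == "NP" and right["cat"] == "VP" and right["S"] == left["I"]:
        return {"cat": "S", "sp": left["sp"] + " " + right["sp"]}
    return None


def combine_either_order(x, y, rule):
    """Try the rule on (x, y) and, failing that, on (y, x)."""
    return rule(x, y) or rule(y, x)


likes = {"cat": "FinV", "I": 5, "S": 9, "O": 3, "sp": "likes"}
fox_np = {"cat": "NP", "I": 3, "sp": "the quick brown fox"}
dog_np = {"cat": "NP", "I": 9, "sp": "the dog"}

# The verb cannot take "the dog" as its object in either order.
failed = combine_either_order(likes, dog_np, verb_phrase)

g13 = combine_either_order(fox_np, likes, verb_phrase)  # fulfilled when reversed
g12 = combine_either_order(g13, dog_np, sentence)       # fulfilled when reversed
print(failed)
print(g12["sp"].capitalize())
```

The spelling of each parent is taken in the order in which the rule was fulfilled, not in the order in which the children happen to hang in the tree, which is why the final spelling reads correctly.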

The machine translation system thus provides the 
correct translation in a relatively small number of eval- 
uation steps, thereby reducing the processing time sub- 
stantially compared with the prior art. By preserving the 
grammatical relationships during the transformation by 
the transformer 103 and by not disturbing correctly eval- 
uated structure in the transformer 106, an efficient and 
elegant technique is provided for translating quickly and 
accurately from the source language to the target language.
In the worst possible case, where there is a
number X of target language signs produced by the
transformer 103, the evaluator 105 would have to perform
less than of the order of X⁴ evaluations before finding
the correct target language sentence or giving up
the attempt. In the case of the prior art, because the 
structure is not transformed and evaluated efficiently, in 
the worst case for X target language signs, a number of 
evaluations equal to X! (factorial X) would be required 
to find the correct target language sentence or to give 
up. For realistic values of X representing normal sen- 
tences, the difference in processing time between the 
present machine translation system and known systems 
can therefore be several orders of magnitude. For instance,
in the case of failure to find a translation with ten
target language signs, the present system would require
less than of the order of 10,000 evaluations whereas the
known system would have to perform of the order of 3.5 
million evaluations before giving up and acknowledging 
failure. Thus, with currently available data processing 
speeds, the present system can be implemented where- 
as the known system is impractical. 
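
The worst-case figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the two expressions are simply the X^4 and X! bounds stated in the text, not measured evaluation counts.

```python
import math


def present_system_bound(x: int) -> int:
    """Upper bound of the order of X^4 stated for the present system."""
    return x ** 4


def prior_art_bound(x: int) -> int:
    """Worst case of X! (factorial X) stated for the prior art."""
    return math.factorial(x)


for x in (5, 10, 15):
    print(x, present_system_bound(x), prior_art_bound(x))
```

For X = 10 this gives 10,000 against 3,628,800, which agrees with the figures of the order of 10,000 and 3.5 million given above.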



Claims 

1. A machine translation system for translating text in
a source language to text in a target language, com- 
prising: an input interface (100) for putting text in 
the source language into the system; an analyser 
(101, 102) for analysing a grammatically complete 
section of the input text into source language signs, 
each of which has an associated label comprising 
an identifier for identifying the sign and data identi- 
fying any other sign to which the sign is grammati- 
cally related; a first transformer (103) for transform- 
ing the source language signs to target language 
signs including transforming the identifiers and the 
data; and a combiner (104) for combining the target
language signs into a target language structure 
such that each target language sign is associated 
with at least one other target language sign, char- 
acterised by: an evaluator (105) for evaluating the 
target language structure so as to identify, from the 
identifiers and the data of the target language signs, 
well formed substructures and target language 
signs not forming part of a well formed substructure; 
and a second transformer (106) for transforming the
target language structure without dissociating from 
each other target language signs forming well 
formed substructures identified by the evaluator 
(105), the evaluator (105) and the second trans- 
former (106) alternately evaluating and transform- 
ing the target language structure. 

2. A system as claimed in Claim 1, characterised in
that the second transformer (106) is arranged to as- 
sociate a first well formed substructure having an 
associated label or a first target language sign not 
forming part of a well formed substructure with a 
second well formed substructure having an associated
label or a second target language sign such
that the identifier of one of the first and second target
language signs or well formed substructures is
included in the data of the other of the first and second
target language signs or well formed substructures.

3. A system as claimed in Claim 2, characterised in 
that the target language structure comprises a hierarchical
structure of nodes with the target language
signs at the lowest order nodes and the second
transformer (106) is arranged to associate the first
target language sign or well formed substructure
with the second target language sign or well formed
substructure of highest nodal order.

4. A system as claimed in any one of the preceding 
claims, characterised in that the evaluator (105) is 
arranged, following a transformation of the target 
language structure by the second transformer (106)
such that at least one well formed substructure is
unchanged, not to re-evaluate the or each unchanged
well formed substructure.

5. A system as claimed in any one of the preceding 
claims, characterised in that the evaluator (105) is
arranged, following transformation of the target language
structure by the second transformer (106)
such that at least one well formed substructure is
changed, to re-evaluate the or each changed substructure
in respect only of change therein.

6. A system as claimed in Claim 5, characterised in 
that the evaluator (105) is arranged to evaluate the 
target language structure in accordance with a monotonic
grammar and, following transformation of
the target language structure by the second transformer
(106) such that a third target language sign
or well formed substructure becomes associated
with a fourth target language sign or well formed
substructure forming part of a fifth well formed substructure,
to re-evaluate the fifth well formed substructure
only in respect of the association between
the third and fourth target language signs and well
formed substructures.

7. A system as claimed in any one of the preceding 
claims, characterised in that the analyser (101, 102) 
comprises a morphological analyser (101) for analysing
the input text into source language morphemes
and a syntactic analyser (102) for analysing
the grammatical relationships between the morphemes
to produce the source language signs.

8. A system as claimed in Claim 7, characterised in 
that the syntactic analyser (102) is arranged to supply
a source language structure of the input text to
the combiner (104), which is arranged to form the
target language structure to resemble the source
language structure.

9. A system as claimed in any one of the preceding 
claims, characterised in that the evaluator (105) 
comprises a further syntactic analyser for analysing
the grammatical relationship between the target 
language signs in the target language structure. 